init checkin to add LassoCV and RERF to optimizers #263
Conversation
…e values in unit tests for BayesianOptimier
@@ -59,20 +61,47 @@ def __init__(
# Now let's put together the surrogate model.
#
print(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')
Suggested change:
- print(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')
+ self.logger.info(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')
CategoricalDimension(name="fit_intercept", values=[False, True]),
CategoricalDimension(name="normalize", values=[False, True]),
CategoricalDimension(name="precompute", values=[False, True]),
DiscreteDimension(name="max_iter", min=0, max=10 ** 5),
ContinuousDimension(name="tol", min=0, max=2 ** 10),
DiscreteDimension(name="max_iter", min=100, max=5 * 10 **3),
Suggested change:
- DiscreteDimension(name="max_iter", min=100, max=5 * 10 **3),
+ DiscreteDimension(name="max_iter", min=100, max=5 * (10 ** 3)),
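For what it's worth, the two spellings already agree, since Python's `**` binds tighter than `*`; the parentheses in the suggested change just make that explicit. A one-line check:

```python
# Precedence check: ** binds tighter than *, so both spellings of the bound
# evaluate to the same value; the parentheses only aid readability.
assert 5 * 10 ** 3 == 5 * (10 ** 3) == 5000
print(5 * 10 ** 3)  # 5000
```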
@@ -89,6 +91,10 @@ def __init__(
self.partial_hat_matrix_ = 0
self.regressor_standard_error_ = 0
# THE HACK |
We may need to explain a little more here. If I remember right:
When LassoCV is used as part of RERF, it cannot reasonably compute the upper and lower bounds on its input space dimensions, as they are a polynomial combination of the inputs to RERF. Thus, it approximates them with the empirical min and max. These approximations are biased: the lower bound is too large, the upper bound is too small. Consequently, during scoring, LassoCV is likely to see input outside of these bounds, but we still want LassoCV to produce predictions for those points. So we introduce a little hack: whenever LassoCV is instantiated as part of RERF, it should skip input filtering on predict. This field controls this behavior.
Feel free to just copy-paste that in, or polish it to your liking!
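A runnable sketch of the behavior described above; all names here are illustrative stand-ins, not the actual MLOS API:

```python
# Toy illustration of the "skip input filtering on predict" hack: when the
# model's bounds are empirical (and therefore biased inward), a flag lets
# out-of-bounds points still receive predictions.
class InputSpace:
    """Toy stand-in: bounds estimated empirically, so they are biased inward."""
    def __init__(self, observed_values):
        self.min = min(observed_values)
        self.max = max(observed_values)

    def contains(self, x):
        return self.min <= x <= self.max


class ToyLassoModel:
    def __init__(self, input_space, skip_input_filtering_on_predict=False):
        self.input_space = input_space
        # RERF would set this to True so points outside the (biased) empirical
        # bounds still get predictions during scoring.
        self.skip_input_filtering_on_predict = skip_input_filtering_on_predict

    def predict(self, xs):
        if not self.skip_input_filtering_on_predict:
            xs = [x for x in xs if self.input_space.contains(x)]
        return [2 * x for x in xs]  # dummy prediction


space = InputSpace(observed_values=[1.0, 2.0, 3.0])  # empirical bounds: [1, 3]
standalone = ToyLassoModel(space)
inside_rerf = ToyLassoModel(space, skip_input_filtering_on_predict=True)
print(standalone.predict([0.5, 2.0, 3.5]))   # [4.0] -- out-of-bounds points dropped
print(inside_rerf.predict([0.5, 2.0, 3.5]))  # [1.0, 4.0, 7.0] -- all points predicted
```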
# add small noise to x to remove singularity,
# expect prediction confidence to be reduced (wider intervals) by doing this
self.logger.info(
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^10."
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^4."
10**4
Suggested change:
- f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^4."
+ f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10**4."
It's clear what you mean... but my CDO strongly suggests that we should stick to the Python exponentiation operator :)
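For context, here is a self-contained sketch of the guard being discussed, using the `**` spelling; the threshold matches the diff, but the noise scale and logger name are illustrative:

```python
import logging

import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("lasso_cv")

rng = np.random.default_rng(0)
# Columns are exactly collinear, so the condition number is enormous.
design_matrix = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])

condition_number = np.linalg.cond(design_matrix)
if condition_number > 10 ** 4:
    logger.info(
        f"Adding noise to design matrix used for prediction confidence "
        f"due to condition number {condition_number} > 10**4."
    )
    # Small jitter breaks the singularity, at the cost of wider
    # prediction confidence intervals.
    design_matrix = design_matrix + rng.normal(scale=1e-6, size=design_matrix.shape)
```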
class MultiObjectiveLassoCrossValidated(NaiveMultiObjectiveRegressionModel):
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
Suggested change:
- """Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
+ """Maintains multiple LassoCrossValidatedRegressionModels each predicting a different objective.
)
# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space. |
Suggested change:
- # We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.
+ # We just need to assert that the model config belongs in lasso_cross_validated_config_store.parameter_space.
class MultiObjectiveRegressionEnhancedRandomForest(NaiveMultiObjectiveRegressionModel):
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
Suggested change:
- """Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
+ """Maintains multiple RegressionEnhancedRandomForestRegressionModels each predicting a different objective.
)
# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space. |
Suggested change:
- # We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.
+ # We just need to assert that the model config belongs in regression_enhanced_random_forest_config_store.parameter_space.
for output_dimension in output_space.dimensions:
print(f'output_dimension.name: {output_dimension.name}')
lasso_model = LassoCrossValidatedRegressionModel(
model_config=model_config,
You copy the model_config in multi-objective RERF, but not here. Why?
Values in the model config are altered by the random forest GridSearchCV for the RERF. When these configs are assigned to different objectives, they stomp all over each other. I'll track down the lines in the RERF model that alter the model_config and explain this in the MultiObjectiveRERF code where you've spotted this difference.
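A toy illustration of that stomping, using hypothetical names rather than the real MLOS classes:

```python
import copy

# Why each objective's model needs its own copy of a mutable config object:
# one model's fit() mutates the config, corrupting every model sharing it.
class ModelConfig:
    def __init__(self):
        self.perform_initial_random_forest_hyper_parameter_search = True

class ToyRerf:
    def __init__(self, model_config):
        self.model_config = model_config

    def fit(self):
        # RERF flips this flag once its grid search completes,
        # mutating the config object it was handed.
        self.model_config.perform_initial_random_forest_hyper_parameter_search = False

# Without copying, one model's fit() silently rewrites the other's config:
shared = ModelConfig()
model_a = ToyRerf(shared)
model_b = ToyRerf(shared)
model_a.fit()
print(model_b.model_config.perform_initial_random_forest_hyper_parameter_search)  # False

# With a copy per objective, each model keeps an independent config:
fresh = ModelConfig()
model_c = ToyRerf(copy.deepcopy(fresh))
model_c.fit()
print(fresh.perform_initial_random_forest_hyper_parameter_search)  # True
```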
# TODO : determine min sample needed to fit based on model configs
random_forest_should_fit = True
return root_base_model_should_fit and random_forest_should_fit
# since polynomial basis functions decrease the degrees of freedom (TODO: add reference),
This is neat :)
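To make the degrees-of-freedom point concrete, here is a small sketch of the count behind it (the sample and input counts are made up for illustration):

```python
from math import comb

# A full polynomial basis of total degree <= `degree` in `n_inputs` variables
# has C(n_inputs + degree, degree) terms (including the intercept); each fitted
# coefficient consumes one residual degree of freedom.
def num_polynomial_features(n_inputs, degree):
    return comb(n_inputs + degree, degree)

n_samples = 50
n_features = num_polynomial_features(n_inputs=5, degree=2)
print(n_features)              # 21 fitted coefficients
print(n_samples - n_features)  # 29 residual degrees of freedom remain
```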
num_testing_samples = 10
elif objective_function_config_name == '5_mutually_exclusive_polynomials':
num_training_samples = 100
num_testing_samples = 50
Suggested change:
  num_testing_samples = 50
+ else:
+     assert False
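The suggested guard, sketched as a helper; the first config name below is hypothetical, the second comes from the diff:

```python
def sample_counts(objective_function_config_name):
    # Mirrors the branch structure above, with the suggested `else: assert False`
    # so an unknown config name fails loudly instead of silently reusing
    # sample counts from a previous branch.
    if objective_function_config_name == '2d_hypersphere':  # hypothetical name
        num_training_samples = 25
        num_testing_samples = 10
    elif objective_function_config_name == '5_mutually_exclusive_polynomials':
        num_training_samples = 100
        num_testing_samples = 50
    else:
        assert False, f"Unknown objective function config: {objective_function_config_name}"
    return num_training_samples, num_testing_samples

print(sample_counts('5_mutually_exclusive_polynomials'))  # (100, 50)
```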
Added LassoCrossValidated (LassoCV) and RegressionEnhancedRandomForest (RERF) regression models to the list of surrogate models available for optimizers. This required creating MultiObjective versions for each of these regression models. Fixed some bugs found via testing with random surrogate_model parameters. Details below by file added/changed:
source/Mlos.Python/mlos/Optimizers/BayesianOptimizerConfigStore.py:
source/Mlos.Python/mlos/Optimizers/BayesianOptimizer.py:
source/Mlos.Python/mlos/Optimizers/RegressionModels/LassoCrossValidatedConfigStore.py:
source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveLassoCrossValidated.py:
New class to allow LassoCV for multi-objective optimizations.
source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveRegressionEnhancedRandomForest.py:
New class to allow RERF for multi-objective optimizations.
Note: the .copy() on line 41 is needed because the model_config.perform_initial_random_forest_hyper_parameter_search value is changed (True -> False) once grid search completes for the random forest fit().
source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveRegressionEnhancedRandomForest.py:
source/Mlos.Python/mlos/Optimizers/RegressionModels/unit_tests/TestMultiObjectiveLassoCrossValidated.py:
New unit tests for new class.
source/Mlos.Python/mlos/Optimizers/RegressionModels/unit_tests/TestMultiObjectiveRegressionEnhancedRandomForest.py:
New unit tests for new class.